Skip to content

refactor of calculateInterVsIntra to reduce complexity and increase p…#47

Closed
openpaul wants to merge 2 commits into
immcantation:masterfrom
openpaul:calculateInterVsIntra_refactor
Closed

refactor of calculateInterVsIntra to reduce complexity and increase p…#47
openpaul wants to merge 2 commits into
immcantation:masterfrom
openpaul:calculateInterVsIntra_refactor

Conversation

@openpaul
Copy link
Copy Markdown

@openpaul openpaul commented Apr 9, 2026

Hello,

After having looked into the pairwiseDist code (immcantation/alakazam#133) I had a look where in scoper this code is used.

I profiled the hierarchicalClones function with profvis and found that a lot of time is spend in the calculateInterVsIntra function, which is where the pairwiseDist function is called. Noticably the pairwiseDist call is not the most time consuming step though. From my trace I could see a lot of GC happening, but it did not directly show what calls caused this.

As such I tried multiple refactors of the code and came up with this solution.

The function has the same signature as before and produces identical output to the previous version. At least as much as my tests have shown me. Tbh the old code was a bit hard to understand, but I hope I got there in the end.

I tried to not use more memory, and keep the code simple. I still have for loops but I removed many paste0 calls, which I think is the cause for the speedup.

I ran a small dataset end to end with hierarchicalClones and see a drop from 35 seconds to 15 seconds compute time. In a larger test before the change it took 12 minutes, after only 1.1 minutes, which is a lot more than I expected. All the more reason to be critical of this PR.

I checked and the clone ids between both runs are identical, but I would appreciate a solid review of the code to make sure I did not introduce any bugs.

I think this function is also used by spectral clustering, but I did no test that path yet. That's why I decided to open this PR before as a draft for now. But thought the speedup is worth sharing. And I hoped that this is interesting enough to get feedback before I continue sinking time into this.

I hope I did not overlook any hidden assumptions of the previous code. The immcantation framework is quite cool but also very complex to disentangle as an outsider :)

All the best

@ssnn-airr ssnn-airr requested a review from ggabernet April 13, 2026 17:13
@openpaul
Copy link
Copy Markdown
Author

Closing for now as speed differences are inconsistent and I want to test some more.

@openpaul openpaul closed this Apr 13, 2026
@openpaul openpaul deleted the calculateInterVsIntra_refactor branch April 13, 2026 17:14
@openpaul openpaul restored the calculateInterVsIntra_refactor branch April 13, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant